<h1>1. Stuff Evaluation</h1>
<p>This page describes the <i>stuff evaluation metrics</i> used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple metrics described below. To obtain results on the COCO test set, for which ground-truth annotations are hidden, generated results must be <a href="#upload">uploaded</a> to the evaluation server. The exact same evaluation code, described below, is used to evaluate results on the test set.</p>
<p>The COCO Stuff Segmentation Task builds on the COCO-Stuff project as described on <a href="https://github.com/nightrome/cocostuff">this website</a> and in this <a href="https://arxiv.org/abs/1612.03716">research paper</a>. This task includes and extends the original dataset release. Please note that in order to scale annotation, stuff segmentations were collected on superpixel segmentations of an image.</p>

<h1>2. Classes</h1>
<p>The stuff labels are organized in a <a href="images/stuff-labelhierarchy.png">semantic hierarchy</a>. Additionally, there is a single 'other' category, with a supercategory of the same name, that subsumes all the thing categories in COCO. For compatibility with the COCO thing categories, we use the following category ids:</p>
<div class="json fontMono">
  <div class="jsonk">1-91</div><div class="jsonv">% thing categories (not used for stuff segmentation)</div>
  <div class="jsonk">92-182</div><div class="jsonv">% stuff categories</div>
  <div class="jsonk">183</div><div class="jsonv">% other category (all thing pixels)</div>
</div>
<p>Notes: every pixel has at most one label, but not all pixels have a label. Unassigned pixels are ignored during evaluation (when stored as an image, they are given label 0). The equal number of thing and stuff categories is a coincidence.</p>
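<p>The id ranges above can be captured in a small lookup helper. The following is a minimal sketch, not part of the COCO API: the function names <span class="fontMono">category_kind</span> and <span class="fontMono">is_evaluated</span> are hypothetical and chosen only for illustration.</p>

```python
def category_kind(cat_id):
    """Map a category id to its group, following the id ranges listed above.

    Hypothetical helper for illustration; not part of the COCO API.
    """
    if cat_id == 0:
        return "unlabeled"  # unassigned pixels, stored as label 0 in images
    if 1 <= cat_id <= 91:
        return "thing"      # not used for stuff segmentation
    if 92 <= cat_id <= 182:
        return "stuff"
    if cat_id == 183:
        return "other"      # all thing pixels
    raise ValueError("category id out of range: %r" % (cat_id,))


def is_evaluated(cat_id):
    """True for ids that count as valid pixels: the stuff classes and 'other'."""
    return category_kind(cat_id) in ("stuff", "other")
```

<p>Under these assumptions, <span class="fontMono">is_evaluated</span> is true for the 91 stuff ids and for id 183, and false for label 0 and for the thing ids 1-91.</p>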

<h1>3. Metrics</h1>
<p>The following metrics are used for characterizing the performance of stuff segmentation methods on COCO. The same metrics are used for leaf node evaluation (e.g. grass, wall-stone, other), as well as for supercategories (e.g. plant, wall, other). The challenge winner will be determined based on the highest Mean IoU over leaf nodes. An overview of these metrics can be found in [<a href="https://arxiv.org/abs/1605.06211" target="_blank">Shelhamer PAMI 2016</a>].</p>
<div class="json fontMono">
  <div class="jsonreg"><b>Leaf category metrics:</b></div>
  <div class="jsonk">mIoU</div><div class="jsonv">% Mean IoU <b>(primary challenge metric)</b></div>
  <div class="jsonk">fIoU</div><div class="jsonv">% Frequency Weighted IoU</div>
  <div class="jsonk">mAcc</div><div class="jsonv">% Mean Accuracy</div>
  <div class="jsonk">pAcc</div><div class="jsonv">% Pixel Accuracy</div>
  <div class="jsonreg"><b>Supercategory metrics:</b></div>
  <div class="jsonk">mIoU<sup>S</sup></div><div class="jsonv">% Mean IoU</div>
  <div class="jsonk">fIoU<sup>S</sup></div><div class="jsonv">% Frequency Weighted IoU</div>
  <div class="jsonk">mAcc<sup>S</sup></div><div class="jsonv">% Mean Accuracy</div>
  <div class="jsonk">pAcc<sup>S</sup></div><div class="jsonv">% Pixel Accuracy</div>
</div><br/>
<ol class="fontSmall">
  <li>Intersection Over Union (IoU) is computed as IoU=TP/(TP+FP+FN), where TP=true positives, FP=false positives, and FN=false negatives.</li>
  <li>Mean IoU (mIoU) is the average over the IoUs of each class.</li>
  <li>Frequency Weighted IoU (fIoU) weights the IoU of each class in proportion to the number of pixels of that class.</li>
  <li>Mean Accuracy (mAcc) denotes the fraction of correct pixels per class averaged over classes.</li>
  <li>Pixel Accuracy (pAcc) denotes the fraction of correct pixels.</li>
  <li>Note that metrics are computed on the <i>pixels</i> in each image, <i>not on object instances</i>.</li>
  <li>All metrics are computed only on valid pixels (91 stuff classes + 1 other class). Unlabeled pixels are not considered.</li>
  <li>If there are no ground-truth pixels for a class in the validation/test set, that class is removed from consideration.</li>
</ol>
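<p>The definitions above can be sketched as a computation on a confusion matrix. This is an illustrative sketch, not the official evaluation code. It assumes that <span class="fontMono">confusion[i][j]</span> counts valid pixels with ground-truth class <span class="fontMono">i</span> predicted as class <span class="fontMono">j</span>, and that the class frequency used by fIoU is the ground-truth pixel count. The function name <span class="fontMono">segmentation_metrics</span> is hypothetical.</p>

```python
def segmentation_metrics(confusion):
    """Compute mIoU, fIoU, mAcc and pAcc from a square confusion matrix.

    Assumption: confusion[i][j] is the number of valid pixels whose
    ground-truth class is i and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    gt_count = [sum(confusion[i]) for i in range(n)]                      # TP + FN
    pred_count = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # TP + FP
    tp = [confusion[i][i] for i in range(n)]

    # Classes without ground-truth pixels are removed from consideration.
    valid = [i for i in range(n) if gt_count[i] > 0]
    if not valid:
        raise ValueError("confusion matrix contains no ground-truth pixels")

    # IoU = TP / (TP + FP + FN), where TP + FP + FN = gt + pred - TP.
    iou = {i: tp[i] / (gt_count[i] + pred_count[i] - tp[i]) for i in valid}
    acc = {i: tp[i] / gt_count[i] for i in valid}

    return {
        "mIoU": sum(iou.values()) / len(valid),
        "fIoU": sum(gt_count[i] / total * iou[i] for i in valid),
        "mAcc": sum(acc.values()) / len(valid),
        "pAcc": sum(tp) / total,
    }


# Toy example with three classes; the third class has no ground-truth pixels.
example = [
    [3, 1, 0],
    [1, 2, 1],
    [0, 0, 0],
]
metrics = segmentation_metrics(example)
```

<p>In this toy example the two classes with ground-truth pixels have IoUs of 3/5 and 2/5, so mIoU is about 0.5. The third class is excluded from the averages because it has no ground-truth pixels, although pixels wrongly predicted as that class still count as false negatives for the others.</p>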

<h1>4. Evaluation Code</h1>
<p><b>Stuff evaluation code is now available</b> on the <a href="https://github.com/nightrome/coco" target="_blank">COCO Stuff GitHub</a> in Python and MATLAB. Before running the evaluation code, please prepare your results in the format described on the <a href="#format-results">results format</a> page.</p>
<p>Here is <a href="https://github.com/nightrome/coco" target="_blank">an overview of all relevant files</a>. Specifically, see <a href="https://github.com/nightrome/coco/blob/master/PythonAPI/cocostuff/cocoStuffDemo.py" target="_blank">cocoStuffDemo.py</a> for a demo that visualizes stuff annotations and <a href="https://github.com/nightrome/coco/blob/master/PythonAPI/cocostuff/cocoStuffEvalDemo.py" target="_blank">cocoStuffEvalDemo.py</a> for a demo that evaluates segmentation results. Calling <span class="fontMono">evaluate()</span> produces a struct <span class="fontMono">eval</span>, which stores the parameters, a timestamp, and the confusion matrix. Calling <span class="fontMono">summarize()</span> then computes the segmentation metrics defined earlier from the <span class="fontMono">eval</span> struct and stores them in <span class="fontMono">stats</span> and <span class="fontMono">statsClass</span>. For more details, see <a href="https://github.com/nightrome/coco/blob/master/PythonAPI/pycocotools/cocostuffeval.py" target="_blank">cocostuffeval.py</a>. For convenience, we provide scripts that convert <a href="https://github.com/nightrome/coco/blob/master/PythonAPI/cocostuff/cocoSegmentationToPngDemo.py" target="_blank">COCO stuff annotations to .png images</a> and <a href="https://github.com/nightrome/coco/blob/master/PythonAPI/cocostuff/pngToCocoResultDemo.py" target="_blank">.png images to COCO results</a>. These can be used both for visualization and for compatibility with existing semantic segmentation code.</p>
